Proxying for Unobservable Variables with Internet Document-Frequency

Authors

  • Albert Saiz
  • Uri Simonsohn
  • David Kwon
  • Caleb Li
Abstract

The internet contains billions of documents. We study whether there is useful information in the frequency with which different topics are written about. Based on the premise that the occurrence of an event increases its textual frequency, we assess whether internet document-frequency can capture cross-sectional variation in the occurrence-frequency of social phenomena. We characterize the conditions under which such proxying is likely to succeed. We successfully proxy for a number of demographic variables at the US city and state levels. We obtain document-frequency-based measures of corruption at the country and state levels and replicate the results of previous research studying its covariates. Finally, we illustrate the usefulness of the approach by creating the first index of corruption in US cities.

JEL: H00, J11, C81, B40, D73
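The core idea in the abstract can be sketched as a ratio: for each unit (for example, a city), divide the number of documents that mention the unit together with a topic keyword by the number of documents that mention the unit at all, then check how closely that ratio tracks a variable measured by conventional means. The sketch below is a minimal illustration under stated assumptions, not the authors' exact procedure: it assumes the hit counts have already been collected from some search index, and the function names (`doc_frequency_proxy`, `pearson_r`) and the toy counts are hypothetical.

```python
from math import sqrt


def doc_frequency_proxy(topic_hits: dict[str, int], unit_hits: dict[str, int]) -> dict[str, float]:
    """Relative document frequency: share of a unit's documents that also mention the topic.

    topic_hits[u]: hypothetical count of documents mentioning unit u AND the topic keyword.
    unit_hits[u]:  hypothetical count of documents mentioning unit u at all.
    Units with zero total hits are skipped to avoid division by zero.
    """
    return {u: topic_hits[u] / unit_hits[u] for u in unit_hits if unit_hits[u] > 0}


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between a document-frequency proxy and a benchmark measure."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Toy, made-up counts for three hypothetical cities (not real search results).
    unit_hits = {"CityA": 10_000, "CityB": 50_000, "CityC": 20_000}
    topic_hits = {"CityA": 30, "CityB": 100, "CityC": 120}
    proxy = doc_frequency_proxy(topic_hits, unit_hits)

    # Made-up benchmark values for the same cities, e.g. a survey-based measure.
    benchmark = {"CityA": 2.5, "CityB": 1.8, "CityC": 6.1}
    cities = sorted(proxy)
    r = pearson_r([proxy[c] for c in cities], [benchmark[c] for c in cities])
    print(proxy)
    print(round(r, 3))
```

Dividing by each unit's total hits is the key design choice in this sketch: raw topic counts would largely reflect how much is written about a city overall (its size or prominence) rather than how often the phenomenon occurs there.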


Similar resources

Internet Group Management Protocol (IGMP) / Multicast Listener Discovery (MLD)-Based Multicast Forwarding ("IGMP/MLD Proxying")

Status of This Memo: This document specifies an Internet standards track protocol for the Internet community, and requests discussion and suggestions for improvements. Please refer to the current edition of the "Internet Official Protocol Standards" (STD 1) for the standardization state and status of this protocol. Distribution of this memo is unlimited. Abstract: In certain topologies, it is not...


Managing energy consumption costs in desktop PCs and LAN switches with proxying, split TCP connections, and scaling of link speed

The IT equipment comprising the Internet in the USA uses about $6 billion of electricity every year. Much of this electricity use is wasted on idle, but fully powered-up, desktop PCs and network links. We show how to recover a large portion of the wasted electricity with improved power management methods that are focused on network issues.


TPOT: translucent proxying of TCP

Transparent Layer-4 proxies are being widely deployed in the current Internet to enable a vast variety of applications. These include Web proxy caching, transcoding, service differentiation, and load balancing. To ensure that all IP packets of an intercepted TCP connection are seen by the intercepting transparent proxy, they must sit at focal points in the network. Translucent Proxying of TCP (...


Downloading Wisdom from Online Crowds

The internet and other large textual databases contain billions of documents: is there useful information in the number of documents written about different topics? We propose, based on the premise that the occurrence of a phenomenon increases the likelihood that people write about it, that the relative frequency of documents discussing a phenomenon can be ...


A Survey on the Effective Socio-Cultural Factors on Internet Tendency among Shosh Payam Noor University Students

Nowadays the internet is one of the main instruments for accessing information. Social groups use it with different motivations, but this phenomenon has a special status among students. In addition to amusement, fun, and a keen motivation to become familiar with the unknown world, students use the internet for educational and scientific purposes and for finding jobs and educational options, which makes them among its main users. The main purpose...




Publication date: 2010